Streaming Graph-Parallel Algorithms for Dynamic Community Detection using Spark GraphX
Abstract
In this paper, we present a new streaming model for graph-parallel community detection in dynamic social networks using Spark GraphX tools on clouds. Two graph algorithms, SLP (streaming label propagation) and SGA (streaming genetic algorithm), are streamlined for graph-parallel execution in the Spark GraphX execution environment. We developed a new streaming pipeline model for GraphX parallel execution. Computational complexities are derived for the SLP and SGA algorithms. They compare very favorably with the conventional non-streaming label propagation (LP) and genetic algorithm (GA) graph algorithms, improving on them by two to three orders of magnitude when the social graph exceeds 10 million edges. The improvement in processing performance scales well with the social graph size and with the number of cloud machine instances used in the graph-parallel execution pipeline.
Keywords: Social Networks, Graph Analytics, Community Detection, Big Data Applications.
Similar resources
GraphX: Graph Processing in a Distributed Dataflow Framework
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recover...
SPARQL over GraphX
The ability of the RDF data model to link data from heterogeneous domains has led to an explosive growth of RDF data. So, evaluating SPARQL queries over large RDF data has been crucial for the semantic web community. However, due to the graph nature of RDF data, evaluating SPARQL queries in relational databases and common data-parallel systems needs a lot of joins and is inefficient. On the oth...
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster ...
An Innovative Approach to RDF Graph Data Processing: Spark with GraphX and Scala
The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are increasingly confronted with various “big data” problems. Query processing is one of them and needs to be efficiently addressed with executions over scalable, highly available and fault tolerant frameworks. Data management systems requiring these proper...
S2X: Graph-Parallel Querying of RDF with GraphX
RDF has constantly gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches using general-purpose cluster frameworks employ a record-oriented perception of RDF ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system....